A conditional random field based feature learning framework for battery capacity prediction

This paper proposes a network framework based on long short-term memory (LSTM) and a conditional random field (CRF) to improve Li-ion battery capacity prediction. The model uses the LSTM to extract temporal features from the data and the CRF to build a transition matrix that strengthens temporal feature learning for long-sequence prediction on lithium battery feature data. The NASA PCOE lithium battery dataset is used for the experiments. Control tests replace the LSTM temporal feature extraction module with a recurrent neural network (RNN), gated recurrent unit (GRU), bi-directional gated recurrent unit (BiGRU), and bi-directional long short-term memory (BiLSTM) network to test how well the CRF method adapts to different temporal feature extraction modules. Compared with previous Li-ion battery capacity prediction methods, the proposed framework achieves better prediction results in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE).

substances during the capacity decay of lithium ions by analyzing the detailed internal electrochemical reaction process and reaction intensity during the aging process of the battery. However, the electrochemical system is complex and its characteristic parameters are coupled with each other, so the dynamic prediction accuracy of such models is poor and wide applicability is difficult to achieve.
The data-driven model establishes the mapping between the characteristic parameters and the health condition at the overall data level by extracting characteristic values of the measured parameters 18,19 . Depending on the data mining method, data-driven approaches are mainly divided into statistical filtering methods, support vector methods, neural network methods, and fusion methods. Statistical filtering extracts and reproduces valid signals and waveforms from data containing a large amount of noise; the best weighting factor, with strong target-following ability, is determined automatically by a recursive linear data processing algorithm 20,21 . He et al. 22 used the extended Kalman filtering (EKF) algorithm to estimate the unknown parameters of a time degradation model of lithium-ion battery capacity and obtained future predictions of the degradation trend of the remaining battery capacity. The support vector machine (SVM), a nonlinear data analysis method, can provide relatively accurate estimation and prediction results with a small amount of data, improve the data quality to a certain extent, and overcome the drawback of the model falling into local extrema. Few unknown parameters and high sparsity are characteristics of this method 23,24 . Zhang et al. 25 improved the prediction performance and operational efficiency for the battery by optimizing the relevance vector machine (RVM). Gao et al. 26 proposed a novel multi-kernel SVM, based on a polynomial kernel and a radial basis kernel function, for predicting the remaining useful life (RUL) of Li-ion batteries, which has better prediction accuracy and stronger generalization ability compared to SVM while reducing training time and computational complexity.
A neural network is a nonlinear prediction method composed of many neurons connected according to certain rules. The connection weights and thresholds of the neurons in the network are trained to build an accurate estimation and prediction model 27,28 . By increasing its depth, a neural network can approximate any nonlinear mapping while retaining a simple structure and high learning ability 29,30 . Neural network approaches mainly include artificial neural networks (ANN), convolutional neural networks (CNN), back propagation (BP) neural networks, gated recurrent units (GRU), and long short-term memory (LSTM) networks. Zhang et al. 31 used LSTM and RNN networks to capture the long-term dependencies in lithium battery capacity degradation for prediction. Fan et al. 32 proposed a GRU-CNN network that learns the shared information and time dependence of charging profiles, including characteristic profiles such as voltage, current, and temperature, to estimate the state of health (SOH). Zhou et al. 33 improved prediction accuracy by using temporal convolutional networks (TCN) to capture the local capacity regeneration phenomenon that occurs during charging and discharging. The fusion method combines different algorithms according to their characteristics, drawing on the strengths of each, which not only ensures the accuracy of the predicted data but also provides an accurate assessment of the prediction uncertainty. Liu et al. 34 proposed a fusion algorithm based on least squares support vector regression (LSSVR) and a hidden Markov model (HMM) to predict the health status of rolling bearings, where LSSVR was used to predict the feature signal and HMM was used to identify state features. Hong et al. 35 proposed a fusion estimation method for the SOH of lithium-ion batteries based on incremental capacity analysis and a weighted Kalman filter algorithm, which has higher prediction accuracy than the common Kalman filter method.
The recent Li-ion battery capacity prediction models are detailed in Table 1.
To improve the accuracy of lithium battery capacity prediction, this paper proposes a Li-ion battery capacity prediction model with the CRF as its core. The CRF is a discriminative probabilistic model for temporal sequences that is widely used in natural language processing (NLP) 46,47 . It constructs the state transition matrix from the trend of the relationship between neighboring labels and obtains the probability distribution of the prediction sequence by reverse decoding, where the state sequence with the highest probability is the optimal prediction. The main contributions of this paper are as follows:
(1) The CRF method is introduced into the capacity prediction problem to calculate the observed state of the capacity prediction sequence through the offset matrix of the feature data, which more intuitively reflects the change in the capacity decline trend.
(2) To improve the prediction accuracy of the CRF model, the study incorporates a CNN convolution module that collects feature data at different time scales and an RNN time-linked module that captures how the feature data change between earlier and later time steps and extracts their time-series relationship information. To verify how well the CRF prediction model fits different time-linked modules, control experiments with GRU, LSTM, BiLSTM, and other modules were added, and the experimental results on the NASA lithium battery dataset showed that LSTM achieved better results.

Methodology
Overall framework of model. Lithium-ion battery remaining life prediction estimates the remaining life of a battery by analyzing and processing its usage data. This paper studies how to make the prediction results more accurate and improve the robustness of the model. Since the number of test time points differs between charge and discharge cycles, the number of test points in the cycle with the most collection points in the data set is taken as the standard, and shorter cycles are padded with zero vectors. The collected data is first passed through the CNN model with its convolutional windows, and the extracted feature vectors, which contain the timing relationship, are then output to the LSTM network. After training, a complete hidden state sequence is obtained, namely the vector containing the time-series feature information of the charge-discharge cycle. Because the CRF performs well on time-series prediction, the vector with time-series feature information produced by the LSTM is input into the CRF model, which yields the final prediction result. The overall framework of the model is shown in Fig. 1.
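The zero-padding step described above can be illustrated with a short NumPy sketch. The function name `pad_cycles` and the toy cycle lengths are hypothetical and chosen only for illustration; the paper does not specify its implementation.

```python
import numpy as np

def pad_cycles(cycles):
    """Zero-pad variable-length cycles to the length of the longest cycle.

    cycles: list of arrays, each of shape (t_i, f) with f features per time point.
    Returns an array of shape (n_cycles, t_max, f).
    """
    t_max = max(c.shape[0] for c in cycles)
    n_features = cycles[0].shape[1]
    padded = np.zeros((len(cycles), t_max, n_features))
    for i, c in enumerate(cycles):
        # Real measurements fill the first t_i rows; the remaining rows stay zero.
        padded[i, : c.shape[0], :] = c
    return padded

# Toy example (assumed data): three cycles with 4, 2 and 3 sampled time points.
cycles = [np.ones((4, 3)), 2 * np.ones((2, 3)), 3 * np.ones((3, 3))]
batch = pad_cycles(cycles)
print(batch.shape)  # (3, 4, 3)
```

Padding at the end of each cycle keeps the measured points aligned from the start of the cycle; whether the original work pads at the start or the end is not stated, so this is an assumption.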

CNN network.
The CNN module mainly uses the convolutional layer of a convolutional neural network to capture the local features of the data, and uses a variety of different convolution kernels to carry out the convolution operation. A max-pooling operation is then used to extract the most effective of these local features while reducing overfitting. Finally, the local feature vectors of the battery test data obtained after convolution and pooling, which contain the time-series relationship, are fused to obtain more effective feature information. The CNN model established in this paper is shown in Fig. 2.
(1) Input layer: This layer is mainly used to receive the initial battery characteristic data. The feature data matrix R is obtained by two-dimensional reconstruction of the multi-feature time-series test data. As shown in Eq. (1), R is connected to the CNN model as the input layer matrix.
where m represents the dimension selected for construction, t represents the time dimension, f represents the feature dimension, and x_n represents the battery data measured in the n-th charge-discharge cycle.
(2) Convolutional layer: This layer uses convolution windows of different sizes to perform convolution operations. The parameters of the convolutional neural network are stored in the weight matrix and the bias matrix. Their initial values are randomly generated and then updated through training. Because the convolution kernels differ in size, the convolution operation can extract various forms of local features, as shown in Eq. (2).
Here, a is the weight, c is the convolution vector matrix to be calculated, b is the bias, and f is the activation function, for which ReLU is selected.
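As a minimal sketch of the convolution step, the following NumPy code slides a single window over the time axis and applies ReLU to a · c + b. It assumes a valid (unpadded) convolution with stride 1 and a scalar bias; the function name `conv1d_window` and the toy input are hypothetical, and the exact form of Eq. (2) in the paper may differ.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_window(R, a, b):
    """Apply one convolution window along the time axis of R.

    R: (t, f) feature matrix of one cycle.
    a: (h, f) kernel weights, where h is the window size.
    b: scalar bias.
    Returns a feature map of length t - h + 1.
    """
    h = a.shape[0]
    t = R.shape[0]
    out = np.empty(t - h + 1)
    for i in range(t - h + 1):
        c = R[i : i + h]              # h consecutive time points
        out[i] = np.sum(a * c) + b    # weighted sum plus bias
    return relu(out)

# Toy input (assumed): 6 time points, 2 features; averaging kernels of size 2 and 3.
R = np.arange(12, dtype=float).reshape(6, 2)
feature_maps = {h: conv1d_window(R, np.ones((h, 2)) / (2 * h), b=0.0) for h in (2, 3)}
for h, g in feature_maps.items():
    print(h, g.shape)
```

Using several window sizes on the same input yields feature maps of different lengths, which is why a later pooling and integration step is needed to bring them to a common shape.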
All neurons in the next layer are calculated by the convolution kernels of the previous layer, so they represent the features of the previous layer detected at different positions. Since multiple convolution kernels are used in the CNN module to calculate the feature mapping matrix of the next layer, multiple feature mapping matrices G_w of the next layer are obtained, where w indexes the convolution window sizes used, that is, the number of CNNs finally integrated.
(3) Pooling layer: This layer filters the information extracted from the convolutional layer matrices through a max-pooling operation to obtain multiple feature mapping matrices P_w. The pooled feature matrices are then compressed into a single feature matrix P; this process is called CNN integration. The integration formula is shown in Eq. (3). The row dimension of the compressed matrix is the same as that of the initial input matrix X, but the data in this matrix express more feature information.
In this way, the input feature data is subjected to multi-layer convolution and pooling operations, yielding a feature vector matrix that contains the timing relationship. This matrix is finally used as the input of the next layer, the LSTM model.
LSTM network. The second layer of the model is the LSTM layer, which is used to process timing features.
The core of LSTM has a four-layer structure, which mainly contains three gates (forget gate, input gate, and output gate) and a memory unit. The LSTM network model is shown in Fig. 3.
LSTM uses the forget gate to determine what information can pass through the state unit. The forget gate determines how much information from the previous time step can pass through, based on the output h_{t−1} of the previous time step and the current input x_t. The calculation of f_t is shown in Eq. (4).
The input gate generates the new information that needs to be updated. This step consists of two parts: the first part determines the update value i_t from the input gate; the second part uses the tanh layer to generate a new candidate value C̃_t, which is added to the state unit as the candidate value generated by the current layer. The values generated by the two parts are then combined to perform the update. The calculations of i_t and C̃_t are as follows:
The last step is to determine the output of the model. First, an initial output is obtained through the sigmoid layer; then tanh is used to scale the C_t value to the range −1 to 1, and the result is multiplied by the output of the sigmoid layer to obtain the output of the model.
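The gate computations described in this subsection can be sketched as a single LSTM time step in NumPy. This follows the standard LSTM formulation, which is assumed here to match the paper's equations; the weight dictionaries `W` and `b`, the layer sizes, and the random inputs are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (standard formulation, assumed).

    W[k]: (hidden, hidden + input) weight matrix for k in 'f', 'i', 'c', 'o'.
    b[k]: (hidden,) bias vector.
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate state
    c_t = f_t * c_prev + i_t * c_hat         # updated state unit
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # output, each entry in (-1, 1)
    return h_t, c_t

# Toy run (assumed sizes): 3 input features, 4 hidden units, 5 time steps.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)
```

The sequence of hidden states `h` produced over all time steps plays the role of the hidden state sequence that is passed on to the CRF layer.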
In the above equations, σ is the sigmoid activation function and tanh is the hyperbolic tangent activation function.
CRF network. In the prediction task, the LSTM is good at processing long series of test data, but it cannot model the dependence between adjacent results in time-series data, especially in the presence of battery capacity regeneration. The CRF can obtain an optimal prediction result through the relationship between neighboring data and thus makes up for this shortcoming of the LSTM. For any sequence X = (x_1, x_2, ..., x_n), assume that p is the output matrix of the LSTM, of size n × k, where n is the time-series prediction step size, k is the dimension of the measurement feature information, and p_ij represents the j-th measurement feature at the i-th time point. For the prediction sequence Y = (y_1, y_2, ..., y_n), the score function is:
Here A represents the transition score matrix, and A_ij represents the score of a transition from predicted value i to predicted value j. The probability of the predicted sequence Y is:
Taking the logarithm of both sides gives the log-likelihood function of the predicted sequence:
In the formula, Ỹ represents the real labeling sequence, and Y_X represents all possible labeling sequences. The output sequence with the largest score after decoding is:
The CRF model is shown in Fig. 4.
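The score, probability, and decoding steps of a linear-chain CRF can be sketched in NumPy as follows. This is a generic sketch under stated assumptions: labels are treated as k discrete states, start and end transitions are omitted for brevity, and the function names and random inputs are hypothetical. How the paper maps continuous capacity values to CRF states is not described in this section.

```python
import itertools
import numpy as np

def sequence_score(P, A, y):
    """Score of a label sequence: sum of P[i, y_i] plus sum of A[y_i, y_{i+1}]."""
    y = np.asarray(y, dtype=int)
    emit = P[np.arange(len(y)), y].sum()
    trans = A[y[:-1], y[1:]].sum()
    return emit + trans

def log_partition(P, A):
    """Log of the summed exp(score) over all label sequences (forward algorithm)."""
    alpha = P[0].copy()
    for i in range(1, P.shape[0]):
        m = alpha[:, None] + A                  # m[prev, cur]
        mx = m.max(axis=0)
        alpha = mx + np.log(np.exp(m - mx).sum(axis=0)) + P[i]
    mx = alpha.max()
    return mx + np.log(np.exp(alpha - mx).sum())

def viterbi(P, A):
    """Highest-scoring label sequence by dynamic programming."""
    n, k = P.shape
    delta = P[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        m = delta[:, None] + A                  # m[prev, cur]
        back[i] = m.argmax(axis=0)              # best previous label for each current label
        delta = m.max(axis=0) + P[i]
    y = [int(delta.argmax())]
    for i in range(n - 1, 0, -1):
        y.append(int(back[i, y[-1]]))
    return y[::-1]

# Toy check (assumed sizes): n = 4 time steps, k = 3 states.
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 3))   # stands in for the LSTM output matrix p
A = rng.normal(size=(3, 3))   # transition score matrix
y_hat = viterbi(P, A)
log_p = sequence_score(P, A, y_hat) - log_partition(P, A)   # log-probability of y_hat
all_seqs = list(itertools.product(range(3), repeat=4))
brute_best = max(all_seqs, key=lambda y: sequence_score(P, A, y))
print(y_hat, list(brute_best))
```

The brute-force comparison over all 81 sequences is only feasible for toy sizes; the dynamic-programming versions scale linearly in n and quadratically in k.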

Experiment
Description of lithium-ion battery datasets. The data used in the experiment come from the NASA PCOE lithium-ion battery data set 48 . A set of four Li-ion batteries (B05, B06, B07, and B18) was run through three different operational profiles (charge, discharge, and impedance) at room temperature. Charging was carried out in constant current mode at 1.5 A until the battery voltage reached 4.2 V and then continued in constant voltage mode until the charge current dropped to 20 mA. Discharge was carried out at a constant current level of
where X_f denotes all readings of sensor f on all units, and ε denotes a positive number approaching 0, which prevents the denominator from being 0.
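The normalization formula itself is not present in this text. A common choice that is consistent with the description of X_f and ε is per-sensor min-max scaling, and the sketch below assumes that form; the function name `min_max_normalize`, the value of `eps`, and the toy readings are hypothetical.

```python
import numpy as np

def min_max_normalize(X, eps=1e-8):
    """Scale each sensor column of X to roughly [0, 1].

    X: (samples, sensors) array of readings.
    eps: small positive number that keeps the denominator nonzero
         when a sensor reading is constant.
    """
    x_min = X.min(axis=0, keepdims=True)
    x_max = X.max(axis=0, keepdims=True)
    return (X - x_min) / (x_max - x_min + eps)

# Toy readings (assumed): the second sensor is constant, so max - min = 0.
X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [4.0, 5.0]])
X_norm = min_max_normalize(X)
print(X_norm)
```

Without ε, the constant second column would cause a division by zero; with it, the column is mapped to zeros.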
Datasets segmentation. In order to verify the generalizability of the prediction results of this framework, three sets of data are randomly selected from the four battery datasets as the training set, and the remaining set is used as the validation and test set. Figure 6 details the overall process of dataset partitioning.
Parameter configuration. The choice of network model parameters often affects the prediction results. The experiment treats the step size of the predicted time series, the number of neurons in the network layer, the learning rate, and batch_size as hyperparameters; the detailed values are shown in Table 2. The ReLU activation function is selected in the convolutional layer, the linear activation function is selected in the fully connected layer, and the marginal learning mode is selected in the CRF.
To obtain hyperparameters suitable for the network model faster, particle swarm optimization (PSO) was chosen as the parameter optimization algorithm. PSO is a swarm intelligence algorithm for finding optimal parameters, which is often used in the parameter search process of network models in battery prediction problems 49,50 . PSO completes the search through each individual's search for optimal values and the sharing of information within the population. Fig. 7 shows the parameter optimization process of the particle swarm algorithm in detail.
Figure 6. Partition of battery datasets.
The experiment is divided into the following specific steps:
(1) Parameter initialization. Set the number of particles n = 10, the particle dimension D = 10 (the number of parameters to be optimized), the learning factors of the particle update c_1 = 1 and c_2 = 0.5, the number of iterations M = 100, and the inertia weight w = 0.8. Randomly generate the initial velocity v_ij and position x_ij of each particle.
(2) The mean square error of the prediction result is used as the objective function of the particles, and the calculation formula is as follows.
(3) Calculate and update the current optimal solution p_i of each particle and the global optimal solution p′_i obtained in the current iteration.
(4) Update the velocity and position of each particle according to the following formulas:
where r_1 and r_2 are uniform random numbers in the range [0, 1], which give the particle swarm algorithm the ability to search randomly and avoid falling into a local optimum.
The parameters obtained by PSO optimization are shown in Table 3.
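The four steps above can be sketched as a compact PSO loop in NumPy. The update formulas are not present in this text, so the sketch assumes the standard velocity and position update; the search bounds, the clipping to those bounds, the initial velocity scale, and the toy objective (a mean square error against a made-up target vector, standing in for the model's prediction error) are all hypothetical.

```python
import numpy as np

def pso(objective, dim=10, n_particles=10, n_iter=100, w=0.8, c1=1.0, c2=0.5,
        bounds=(-1.0, 1.0), seed=0):
    """Standard particle swarm optimization (assumed update rule).

    Returns the best position, its objective value, and the best value per iteration.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Step (1): random initial positions and velocities.
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = rng.uniform(-1.0, 1.0, size=(n_particles, dim)) * 0.1 * (hi - lo)
    # Step (2): evaluate the objective for every particle.
    p_best = x.copy()
    p_val = np.array([objective(p) for p in x])
    g_idx = int(p_val.argmin())
    g_best, g_val = p_best[g_idx].copy(), float(p_val[g_idx])
    history = [g_val]
    for _ in range(n_iter):
        # Step (4): velocity and position update with random factors r1, r2 in [0, 1).
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lo, hi)
        # Step (3): update personal bests and the global best.
        val = np.array([objective(p) for p in x])
        improved = val < p_val
        p_best[improved] = x[improved]
        p_val[improved] = val[improved]
        if p_val.min() < g_val:
            g_idx = int(p_val.argmin())
            g_best, g_val = p_best[g_idx].copy(), float(p_val[g_idx])
        history.append(g_val)
    return g_best, g_val, history

# Toy objective (assumed): mean square error to a fixed 10-dimensional target.
target = np.linspace(-0.5, 0.5, 10)
objective = lambda p: float(np.mean((p - target) ** 2))
best_x, best_val, history = pso(objective)
print(best_val)
```

In the setting of the paper, each particle position would encode a hyperparameter configuration and the objective would require training and evaluating the network, which makes each evaluation far more expensive than in this toy example.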

Evaluation metrics.
To quantify the forecast results for comparison and analysis, RMSE and MAPE are used to evaluate the performance of the model in this paper 51 . The evaluation metrics are calculated as follows:
In the formulas, N is the total number of measurements predicted by the model. The results were averaged over several experiments.
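For reference, the two metrics can be computed as in the following sketch, which uses their usual definitions. Reporting MAPE as a percentage is an assumption, and the capacity values are made-up numbers for illustration only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over N predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Made-up capacities in Ah, for illustration only.
y_true = [2.0, 1.8, 1.6, 1.5]
y_pred = [1.9, 1.8, 1.7, 1.5]
print(rmse(y_true, y_pred), mape(y_true, y_pred))
```

RMSE is expressed in the unit of the capacity and penalizes large errors more strongly, while MAPE is relative to the true value, so the two metrics complement each other.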

Experimental results and discussion
Results of time-linked module control experiment. To test the effect of different temporal association modules on the prediction accuracy of CRF models, control experiments with LSTM, GRU, and BiLSTM were designed. The experiments also compare using a single-layer CNN network with using a two-layer CNN network to extract the feature information that serves as the network input. The output of the temporal association module is used as the input of the CRF model. The experiments are performed on B18, and the results are shown in Table 4.
The experimental results show that the prediction error of LSTM is smaller than that of the other RNNs, which indicates that the temporal information extracted by LSTM is better suited as input to the CRF network. Compared with single-layer CNN networks, two-layer CNN networks obtain better results. We found that this is because the pooling layer in the two-layer CNN changes the length of the input to the network model, which enables the second CNN layer to extract a wider range of feature information.
The results of the experiments with and without the CRF are shown in Table 5. They show that the CRF improves the prediction accuracy of the network model: the RMSE and MAPE metrics on the four datasets B05, B06, B07, and B18 improved by more than 20% on average compared with the model without the CRF, and the MAPE on the B06 dataset improved by 53%, the largest improvement in the experiment. This indicates the importance of the CRF model, which is further supported by the subsequent experiments on the probability distribution of the prediction results.

Results of capacity prediction.
To show the prediction results of this method intuitively, Fig. 8 presents the original measured capacity and the model-predicted capacity for battery data sets B05, B06, B07, and B18 (the threshold of the precision region α is ±2.5%).
The figure shows that most of the predicted results are within the error range of the true capacity. The predicted values around cycle 78 on the B05, B06, and B07 data sets are poorer, which is due to the capacity rebound caused by the capacity regeneration phenomenon during the charging and discharging of Li-ion batteries; a sudden change in capacity brings a larger prediction error than a smooth state. Because we use complete battery datasets for model training to predict a different battery dataset, and the CRF model uses the feature offset matrix during training to capture and record the overall trend of the battery capacity, a large error is reduced in the subsequent prediction process according to the learned record.
Comparison with previous models. In order to verify the prediction superiority of the CNN-LSTM-CRF model proposed in this paper, comparative experiments were conducted with SVM, LSTM, and GRU models. The RMSE and MAPE results of the four models are compared in Table 6, which shows the prediction accuracy of this algorithm more intuitively.
The table shows that, for both the RMSE and MAPE metrics, the average results of the model in this paper are superior to those of the comparison models, illustrating the feasibility of the proposed CNN-LSTM-CRF model for the battery capacity prediction problem.

Conclusion
For the problem of lithium battery capacity prediction, this paper takes inspiration from the field of NLP and proposes a combined CNN-LSTM-CRF neural network prediction model, which is applied to battery remaining life prediction for the first time. The model takes continuous-time battery measurement data as input and predicts the battery capacity at the current time point to obtain the remaining battery life at this time. Compared with previous battery capacity prediction network models, the major difference in this model is the inclusion of the CRF. The capacity prediction sequence is calculated through the offset matrix of the feature data, which more intuitively reflects the change in the decreasing trend of capacity. The CNN convolutional module is added to the model to collect the feature data, and the time-linked module captures the trend of the feature data in the time dimension to extract the temporal information. Among the time-linked modules, LSTM achieves better results in the control experiments. The ablation experiments demonstrate the effectiveness of the CRF network in the capacity prediction process. Compared with previous models, our model achieves better prediction results.
Our model still has flaws. The large number of combined network structures makes the depth and computation of the model large, which costs more computational resources and time. Future work can experiment with transfer learning in the model learning process and use the extracted trained network parameters to adjust the network model, making real-time prediction with the model possible.